Abstract. - Human society is a very complex system; still, it shows several non-trivial,
general features. One of them is the presence of power-law distributed quantities in temporal
statistics. In this Letter, we focus on the origin of power-laws in the rating of movies. We present
a systematic empirical exploration of the time between two consecutive ratings of movies (the
interevent time). At an aggregate level, we find a monotonic relation between the activity
of individuals and the power-law exponent of the interevent-time distribution. At an individual
level, we observe a heavy-tailed distribution for each user, as well as a negative correlation between
the activity and the width of the distribution. We support these findings with a similar data set
from mobile phone text-message communication. Our results demonstrate a significant role of
the activity of individuals in the society-level patterns of human behavior. We believe this is a
common character of interest-driven human dynamics, corresponding to (but different from)
the universality classes of task-driven dynamics.
Introduction. - For decades, the social sciences have studied how
large-scale patterns of human activity emerge from the behavior of
individuals [1]. Until a decade ago, data sets were typically gleaned
from questionnaires, observational studies, etc., and were
understandably rather small. Some statistical quantities need very
large statistics to be seen. One such example is power-law degree
distributions. With the development of information (and database)
technology in the last decade, we can now observe structures that
require large data sets. One such recently observed phenomenon is the
power-law distribution of interevent times of online activity. This
feature can be seen both at the level of populations [2-7] and
individuals [8-10], and cannot be explained by independent, uniformly
random, interaction patterns. Understanding such emerging
communication patterns is essential to be able to predict the impact
of new technologies, the spread of computer viruses [11,12], human
travel [13], etc.

How do power-laws in response, or interevent, times occur? In a
pioneering work, Barabási [8] proposed a queuing model as explanation
(later solved analytically [9, 14, 15]). In this model, the power-law
statistics do not come from a power-law distributed trait of the
agents, but emerge from interaction between the agents and the
environment. Barabási's model gives response times of two universality
classes: one with power-law exponent $\alpha = 1$ (observed in e-mail
communication [8, 16]), and one with $\alpha = 1.5$ (observed in
surface mail communication [17]). The behavioral origin of power-law
tails, according to Barabási's model [8], is that individuals use a
highest-priority-first (HPF) protocol to decide which task needs to be
executed first (rather than a first-in-first-out strategy). However,
power-laws have been observed in systems driven by individuals
arguably not guided by task-lists (e.g., web browsing [10], networked
games [18]
and online chatting [19]). In this work, we perform a detailed study
of such a system, namely an online infrastructure for rating movies.
Our primary quantity is the time $\tau$ between two consecutive movie
ratings. The distribution $p(\tau)$ of the aggregated data follows a
power law spanning more than two orders of magnitude. More
interestingly, we observe a monotonic relation between the power-law
exponent and the mean activity in the group (see below for how the
whole population is divided into groups). This suggests that the
activity of individuals is one of the key ingredients determining the
distribution of interevent times.

Fig. 1: The distribution of interevent time at the population level,
indicating that $p(\tau) \sim \tau^{-2.08}$. The solid line in the
log-log plot has slope $-2.08$. The data exhibit weekly oscillations,
reflecting a weekly periodicity of human behavior, which has also been
observed in e-mail communication [20].

Data source. - Our data, obtained from www.netflixprize.com, were
collected by Netflix, a large American company for mail-order DVD
rentals. The users can rate movies online. This information is used to
give the users personalized recommendations. The data were made public
as part of a competition for a better recommender system. In total,
the data comprise $M = 17{,}770$ movies, $N = 447{,}139$ users and
$\sim 9.67 \times 10^7$ records. Each record consists of four
elements: a user ID $i$, a movie ID $\alpha$, the user's rating
$v_{i\alpha}$ (from 1 to 5), and the time of the rating $t_{i\alpha}$.
Tracking the records of a given user $i$, one gets $k_i - 1$
interevent times, where $k_i$ is the number of movies $i$ has rated.
The time resolution of the data is one day.
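
As a minimal sketch of the bookkeeping described above, the following Python snippet groups records by user, sorts each user's rating days, and takes consecutive differences. The function name `interevent_times`, the tuple layout, and the toy records are assumptions made for illustration; they are not the actual format of the Netflix files.

```python
from collections import defaultdict


def interevent_times(records):
    """Return {user_id: [gaps in days]} from (user_id, movie_id, rating, day) records.

    `day` is assumed to be an integer day index (the data have one-day resolution).
    """
    days_by_user = defaultdict(list)
    for user_id, _movie_id, _rating, day in records:
        days_by_user[user_id].append(day)

    gaps = {}
    for user_id, days in days_by_user.items():
        days.sort()
        # k ratings give k - 1 consecutive differences.
        gaps[user_id] = [later - earlier for earlier, later in zip(days, days[1:])]
    return gaps


# Toy records, invented for illustration only.
records = [
    ("u1", "m1", 4, 0),
    ("u1", "m2", 5, 3),
    ("u1", "m3", 2, 10),
    ("u2", "m1", 3, 5),
    ("u2", "m4", 1, 6),
]
print(interevent_times(records))  # {'u1': [3, 7], 'u2': [1]}
```

Because the resolution is one day, several ratings made on the same day give gaps of zero in this sketch; whether such gaps are kept or dropped is a choice left open here.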

Interevent time distribution for the whole population. - In Fig. 1,
we report the interevent time distribution based on the aggregated
data of all users. The distribution follows a power law,
$p(\tau) \sim \tau^{-\gamma}$, for more than two orders of magnitude.
The power-law exponent, $\gamma \approx 2.08$, is obtained by maximum
likelihood estimation [21]. All the power-law exponents reported in
this Letter are obtained by this method. To avoid bias from the weekly
oscillation visible in Fig. 1, at the whole-population level we only
include data points separated by one week. That is to say, in the
calculation of the power-law exponent, only the data points
$F(7), F(14), F(21), \ldots$ are considered, where $F(\tau)$ denotes
the frequency of interevent time $\tau$. A proposed mechanism for the
emergence of power-law distributions with $\gamma \approx 2.0$ is
aggregation of Poissonian distributions with different, uniformly
distributed, characteristic times [22]. However, as we will see later,
the empirical statistics and analysis at group and individual levels
demonstrate that this scaling law cannot be caused by a combination of
Poissonian agents.

Fig. 2: Typical distributions of interevent times at a group level:
group 4 (upper panel) and group 17 (lower panel). The solid lines in
the log-log plots have slopes $-2.41$ and $-1.71$, respectively. The
corresponding mean activities are 1.274 and 0.112.
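
The text cites [21] for the maximum likelihood fit but does not spell out how the weekly subsampling enters the estimator. The sketch below is one possible reading, not the authors' procedure: it assumes the retained points follow a discrete power law on $\tau = 7, 14, 21, \ldots$, so that with $k = \tau/7$ the model is $p(k) = k^{-\gamma}/\zeta(\gamma)$. The names `zeta` and `fit_weekly_exponent`, the search interval, and the synthetic input are all hypothetical.

```python
import math


def zeta(gamma, n_terms=1000):
    """Riemann zeta for gamma > 1: direct sum plus an Euler-Maclaurin tail estimate."""
    head = sum(k ** -gamma for k in range(1, n_terms))
    tail = n_terms ** (1.0 - gamma) / (gamma - 1.0) + 0.5 * n_terms ** -gamma
    return head + tail


def fit_weekly_exponent(freq, lo=1.05, hi=6.0, tol=1e-6):
    """Maximum-likelihood gamma for p(tau) ~ tau^-gamma on tau = 7, 14, 21, ...

    `freq` maps an interevent time (a multiple of 7 days) to its count F(tau).
    """
    total = sum(freq.values())
    sum_log = sum(count * math.log(tau / 7) for tau, count in freq.items())

    def neg_log_likelihood(gamma):
        return gamma * sum_log + total * math.log(zeta(gamma))

    # Golden-section search; the negative log-likelihood is convex in gamma.
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - ratio * (b - a)
        d = a + ratio * (b - a)
        if neg_log_likelihood(c) < neg_log_likelihood(d):
            b = d
        else:
            a = c
    return 0.5 * (a + b)


# Synthetic frequencies built from a known exponent, for illustration only.
synthetic = {7 * k: 1e6 * k ** -2.08 for k in range(1, 2001)}
estimate = fit_weekly_exponent(synthetic)  # should land close to 2.08
```

On real data one would also have to choose a lower cutoff and test the goodness of fit, as discussed in [21]; neither step is shown here.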

Interevent time distribution for groups. - The HPF protocol [8]
explains heavy tails in response times of human communication.
Nevertheless, we lack an in-depth understanding of the interevent time
distribution in data sets such as ours. We can probably not explain
the aggregated distribution by identical behavior. A heavy smoker,
consuming fifty cigarettes per day, would not make a long pause.
Events separated by longer times would (assuming smoking patterns
follow the same statistics) come from other people: occasional
party-smokers, mischievous adolescents, or similar. Similarly, the
other end of the spectrum in Fig. 1 probably corresponds to other
persons. To get at this we measure the activity $A_i$ [23], the
frequency of events of an individual: $A_i = n_i/T_i$, where $n_i$ is
the total number of records of $i$, and $T_i$ is the time between the
first and the last event of $i$. In other words, $A_i$ is the
frequency of movie ratings of $i$. As shown in Fig. 1, the mean
activity, averaged over all users, is $\langle A \rangle = 0.812$.
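
The activity defined above, together with the activity-ranked grouping used in the analysis that follows, could be computed along these lines. This is only a sketch: the helper names `activity` and `split_into_groups` are hypothetical, and returning `None` for users whose events all fall on one day ($T_i = 0$) is an assumption, since the text does not say how such users are treated.

```python
def activity(event_days):
    """A_i = n_i / T_i for one user, or None when T_i = 0 (assumed convention)."""
    n = len(event_days)
    span = max(event_days) - min(event_days)
    return n / span if span > 0 else None


def split_into_groups(activities, n_groups=20):
    """Sort users by descending activity and cut the list into near-equal groups.

    `activities` maps user_id -> activity (users without a defined activity
    should be filtered out beforehand).
    """
    ranked = sorted(activities, key=activities.get, reverse=True)
    size, extra = divmod(len(ranked), n_groups)
    groups, start = [], 0
    for g in range(n_groups):
        # The first `extra` groups take one additional user.
        end = start + size + (1 if g < extra else 0)
        groups.append(ranked[start:end])
        start = end
    return groups


# Toy values, invented for illustration only.
toy = {"a": 0.9, "b": 0.5, "c": 0.1, "d": 0.3, "e": 0.7}
print(activity([0, 3, 10]))               # 0.3
print(split_into_groups(toy, n_groups=2))  # [['a', 'e', 'b'], ['d', 'c']]
```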

Fig. 3: The relation between the power-law exponent $\gamma$ of the
interevent time distribution and the mean activity of each group. Each
point corresponds to one group. All the exponents are obtained by
maximum likelihood estimation and pass the Kolmogorov-Smirnov test
with threshold quantile 0.9 [21].

To investigate the role of activity, we sort the users by activity in
descending order, and then divide this list into twenty groups, each
of which has almost the same number of users. Accordingly, the mean
activity of each group obeys the inequality
$\langle A \rangle_1 > \langle A \rangle_2 > \cdots > \langle A \rangle_{20}$.
In Fig. 2, we report two typical distributions of interevent time at a
group level. Both these distributions follow power-laws. Note that the
group with lower activity has a smaller power-law exponent, giving a
longer average interevent time. The corresponding distributions for
the other groups follow power-law forms as well, but with different
exponents. In Fig. 3 we plot the exponent as a function of activity.
There is a non-trivial, monotonic increase of the exponent with the
activity. This relation, in accordance with our smoker example above,
indicates the significant role of activity for the observed, aggregate
behavior. Note that, for a mathematically ideal power-law distribution
$p(\tau) \sim \tau^{-\gamma}$, the exponent $\gamma$ has a one-to-one
correspondence with $A$ from the relation

$$\gamma(A) = 1 + \frac{1}{1 - A}, \qquad 0 < A < 1. \qquad (1)$$

For $A > 1$, there is no corresponding normalized probability
distribution of $\tau$ of a power-law form. However, the situation in
the real data is very different. As shown in Figs. 1 and 2, the
activity is mainly determined by the drooping head of $p(\tau)$, not
the tail used to calculate $\gamma$ (we consider
$\tau = 7, 14, 21, \ldots$ only). A similar case can be found in [8]
and its supplementary material, where a peak at $p(\tau = 1)$, which
was ignored in the calculation of $\gamma$, mainly describes the
individual activity.
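
Relation (1) can be recovered in a few lines under two assumptions that are ours rather than stated in the text: the ideal distribution is a continuous power law on $\tau \ge 1$ (one day being the smallest resolvable interval), and the activity is identified with the inverse mean interevent time, $A = 1/\langle\tau\rangle$.

```latex
% Assumptions: continuous power law on tau >= 1, and A = 1 / <tau>.
\begin{align}
p(\tau) &= (\gamma - 1)\,\tau^{-\gamma}, \qquad \tau \ge 1,\ \gamma > 1, \\
\langle \tau \rangle &= \int_1^\infty \tau\, p(\tau)\, d\tau
   = \frac{\gamma - 1}{\gamma - 2}, \qquad \gamma > 2, \\
A &= \frac{1}{\langle \tau \rangle} = \frac{\gamma - 2}{\gamma - 1}
   \quad\Longrightarrow\quad
   \gamma = \frac{2 - A}{1 - A} = 1 + \frac{1}{1 - A}.
\end{align}
```

As $\gamma$ runs over $(2, \infty)$, $A$ runs over $(0, 1)$, which is why no normalized power law with a finite mean corresponds to $A > 1$ in this idealized setting.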

If every monitored individual has a Poisson distributed activity with
an individual rate $A$, then the distribution of interevent time
should be [22]

$$p(\tau) \sim f(A)\,\tau^{-2}, \qquad (2)$$

where $f(A)$ is the activity distribution of individuals. Since the
power-law exponent at the population level is close to 2, if it
results from an aggregation of Poissonian individuals, the activity
distribution should follow a uniform pattern. However, as shown in the
main plot of Fig. 4, the activity distribution at the population level
is not uniform. In contrast, as reported in the insets of Fig. 4, the
cumulative distribution $F(A)$ for group 4 and group 17 can be well
fitted by a straight line, suggesting a uniform distribution $f(A)$,
while the exponents $\gamma_4$ and $\gamma_{17}$ are far from each
other, and both different from 2. Therefore, the heavy-tailed nature
at the group level cannot originate from homogeneous Poissonian
individuals. To our knowledge, this is the first time a monotonic
relation has been observed between the power-law exponent of the
interevent time distribution and a certain measure (i.e. activity). We
believe this analysis illustrates the important role of the individual
activity in the aggregate pattern of human behavior.

Fig. 4: Cumulative distribution of activities for all the individuals.
The distribution is intermediate between exponential and power-law.
The insets display the same measure for group 4 and group 17,
respectively.
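
The claim that uniformly distributed Poisson rates produce an exponent of 2 can be checked numerically. The sketch below assumes that every individual contributes with equal weight and that rates are uniform on $(0, \Lambda]$; under those assumptions the mixture $\frac{1}{\Lambda}\int_0^{\Lambda} \lambda e^{-\lambda\tau}\,d\lambda$ has a closed form, and its log-log slope tends to $-2$ for large $\tau$. The function names and the parameter `rate_max` (standing for $\Lambda$) are hypothetical.

```python
import math


def aggregated_density(tau, rate_max=1.0):
    """Interevent-time density for Poisson individuals with rates uniform on (0, rate_max].

    Closed form of (1 / rate_max) * integral_0^rate_max lam * exp(-lam * tau) d lam.
    """
    x = rate_max * tau
    return (1.0 - math.exp(-x) * (1.0 + x)) / (rate_max * tau ** 2)


def local_slope(tau, factor=2.0, rate_max=1.0):
    """Log-log slope of the aggregated density between tau and factor * tau."""
    p1 = aggregated_density(tau, rate_max)
    p2 = aggregated_density(factor * tau, rate_max)
    return math.log(p2 / p1) / math.log(factor)


for tau in (0.001, 1.0, 10.0, 100.0):
    print(tau, local_slope(tau))
```

The slope is close to zero for $\tau \ll 1/\Lambda$ and approaches $-2$ for $\tau \gg 1/\Lambda$, so a uniform $f(A)$ within a group would predict the same exponent for every group, in contrast to the group exponents reported above.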

Interevent time distribution for individuals. - To continue tying
together micro- and macro-phenomena, we look closer at the behavior of
individual agents. In particular, we investigate whether or not the
monotonic relation between activity and power-law exponent also holds
at an individual level.

Figs. 5(a) and (b) report the interevent time distribution $p(\tau)$
of two individual users. We observe a relation similar to that of the
group-level statistics. That is to say, the less active agent has a
broader distribution and a smaller power-law exponent. Although the
distributions shown in Figs. 5(a) and (b) have heavy-tailed forms,
they do not pass the Kolmogorov-Smirnov test with threshold quantile
0.9 [21]. We believe this can be explained by the relatively short
sample times of the individual records. (The typical duration of
individual records, in our case, ranges from a few months to a few
years. This range is not as impressive as, e.g., Refs. [17,24], where
surface mail is studied for a period of more than half a century with
a resolution in days.)
It may be the case that a credible power-law scaling will emerge after
a sufficient while; however, so far, we cannot claim that typical
$\tau$-distributions follow power-law forms. Nevertheless, almost
every user has a heavy-tailed distribution (that is, much broader than
a Poisson distribution with the same average interevent time
$\langle\tau\rangle$). We use the second moment,
$\langle\tau^2\rangle = \int \tau^2 p(\tau)\,d\tau$, to measure the
width of $p(\tau)$. As seen in Fig. 6, all individual distributions
have much larger $\langle\tau^2\rangle$ than the Poisson distributions
with the same $\langle\tau\rangle$. Moreover, we observe a negative
correlation between $\langle\tau^2\rangle$ and $A$, which can be seen
as an individual-level variant of the relation in Fig. 3. Although the
negative correlation can also be detected in Poisson distributions,
this finding is interesting since it highlights the activity, as
opposed to universality classes, as a signifier of human dynamics.

Fig. 5: (Color online) The interevent time distribution between
(a)-(b) two consecutive movie ratings by two Netflix users, and
(c)-(d) two consecutive text messages sent by two mobile telephone
users. The time unit for (a) and (b) is one day, and for (c) and (d)
one hour. Under the threshold quantile 0.9, the distributions in (a)
and (b) cannot pass the Kolmogorov-Smirnov test, while those in (c)
and (d) do pass it.
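
To make the width comparison concrete, the sketch below computes the empirical second moment of one user's interevent times and the second moment of a Poisson baseline with the same mean. It takes the text literally and assumes the baseline is a Poisson distribution of $\tau$ with mean $m = \langle\tau\rangle$, for which $\langle\tau^2\rangle = m + m^2$; if one instead modeled exponential waiting times of a Poisson process, the baseline would be $2m^2$. The function names and the toy gaps are hypothetical.

```python
def second_moment(gaps):
    """Empirical <tau^2> of a list of interevent times."""
    return sum(g * g for g in gaps) / len(gaps)


def poisson_second_moment(mean_gap):
    """<tau^2> of a Poisson distribution with the given mean (variance = mean)."""
    return mean_gap + mean_gap ** 2


# Toy bursty user, invented for illustration: four short gaps and one long pause.
gaps = [1, 1, 1, 1, 26]
mean_gap = sum(gaps) / len(gaps)

print(second_moment(gaps))              # 136.0
print(poisson_second_moment(mean_gap))  # 42.0
```

In this toy case the empirical second moment is about three times the Poisson value at the same mean, which is the kind of excess width the comparison in the text refers to.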

To check the generality of our observations of the relation between
activity and interevent time patterns, we investigate another
empirical data set of mobile phone text-message communication. The
data set comprises all messages sent and received by 20 users over
half a year. Figures 5(c) and (d) report two typical interevent time
distributions. These show yet more credible power-laws than those in
the Netflix data (Figs. 5(a) and (b)). Actually, in this data set, all
users show a power-law distribution passing the Kolmogorov-Smirnov
test. (Note that the time resolution of the text-message data is
seconds. Thus, half a year is long compared to the Netflix data.) The
activities and exponents belong to the intervals
$A \in [6.09, 60.72]$ and $\gamma \in [1.41, 2.25]$. Even at the
individual level (which is sensitive to fluctuations in personal
habits), an almost monotonic relation between $A$ and $\gamma$ is
observed (with the exception of two users that show a slight
deviation). A similar relation can also be found in data of online Go
(duiyi.sports.tom.com; in these data the individual records span
years, and the resolution is hours). Here, the more active players
also have larger power-law exponents and narrower interevent time
distributions. However, for commercial reasons, the aggregated data
cannot be freely downloaded. Therefore, for the text-message and
online Go data we cannot analyze the aggregate level statistics.

Fig. 6: (Color online) Scatter plot of the second moment
$\langle\tau^2\rangle$ versus activity, indicating a negative
correlation. The red curve shows the average value of
$\langle\tau^2\rangle$ for a given activity, and the blue curve
represents the case of a Poisson distribution whose expected value is
given as the inverse of activity.
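
One way a reader might quantify an "almost monotonic" relation between activity and exponent is a rank correlation. The sketch below uses Spearman's coefficient, which is our choice for illustration and not a method used in the text; the function names and the sample values are hypothetical and are not the measured data.

```python
import math


def ranks(values):
    """Average 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2.0 + 1.0
        for j in range(pos, end + 1):
            result[order[j]] = mean_rank
        pos = end + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


# Invented values: exponents rise with activity except for one swapped pair.
toy_activity = [1.0, 2.0, 3.0, 4.0]
toy_exponent = [1.5, 1.6, 1.9, 1.8]
rho = spearman(toy_activity, toy_exponent)  # 0.8 up to rounding
```

A coefficient of 1 would mean a strictly monotonic increase; values slightly below 1 correspond to a few out-of-order users, as in the two exceptions mentioned above.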

Conclusions. - In previous works, the heavy-tailed interevent time
distribution has been explained by a queuing mechanism in the decision
making of agents. This is a relevant scenario for task-driven
situations (such as e-mail [8] or surface mail [17] communication).
However, similar heavy-tailed distributions also exist in many
interest-driven systems (e.g. web browsing [10], networked computer
games [18], online chat [19]; or, as in our examples, text-message
sending and movie rating), where no tasks are waiting to be executed.
As opposed to focusing on universality classes (as for task-driven
systems), we highlight a common character of interest-driven systems:
the power-law exponents vary over a wide range, with a strongly
positive correlation to the individual's activity. This finding is
helpful for further understanding the underlying origins of heavy
tails in interest-driven systems. A power-law distribution of activity
might also be a factor in the dynamics of task-driven systems. This is
reminiscent of the power-law distribution of extinction events (which
can be explained both by the internal dynamics of evolution and by a
power-law distribution of the magnitudes of natural disasters [25]).